%expert[e89,jmc] Expert Systems and Mathematical Logic (for IAKE)
% common.tex[e83,jmc]
% thomas[f88,jmc],
% final printed version in thomas[s89,jmc]
\input memo.tex[let,jmc]
\title{Expert Systems and Mathematical Logic}
%This page taken with slight modifications from common.tex[e83,jmc].
An {\it expert system} is a computer program intended to
embody the knowledge and ability of an expert in a certain
domain. Some of the ideas behind them and several examples have
been described in other lectures in this symposium. Their
performance in their specialized domains is often very
impressive. Nevertheless, hardly any of them have the {\it
common sense} knowledge and ability possessed by any
non-feeble-minded human. This lack makes them ``brittle''. By
this is meant that they are difficult to extend beyond the scope
originally contemplated by their designers, and they usually
don't recognize their own limitations. Many important
applications will require common sense abilities. The object of
this lecture is to describe common sense abilities and the
problems that require them.

Common sense facts and methods are only very partially
understood today, and extending this understanding is the key
problem facing artificial intelligence.

This isn't exactly a new point of view. I have been
advocating ``Computer Programs with Common Sense'' since I wrote
a paper with that title in 1958. Studying common sense
capability has sometimes been popular and sometimes unpopular
among AI researchers. At present it's popular, perhaps because
new AI knowledge offers new hope of progress. Certainly AI
researchers today know a lot more about what common sense is than
I knew in 1958 --- or in 1969 when I wrote another paper on the
subject. However, expressing common sense knowledge in formal
terms has proved very difficult, and the number of scientists
working in the area is still far too small.

One of the best known expert systems is
Mycin (Shortliffe 1976; Davis, Buchanan and Shortliffe 1977),
a program for advising physicians on treating bacterial
infections of the blood and meningitis.
It does reasonably well without common sense, provided
the user has common sense and understands the program's limitations.
Mycin conducts a question and answer dialog.
After asking basic facts about the patient such
as name, sex and age, Mycin asks about suspected bacterial
organisms, suspected sites of infection, the presence of specific
symptoms (e.g. fever, headache) relevant to diagnosis, the outcome
of laboratory tests, and some others. It then recommends a certain
course of antibiotics. While the dialog
is in English, Mycin avoids having to understand freely written
English by controlling the dialog. It outputs sentences, but
the user types only single words or standard phrases. Its major
innovations over many previous expert systems were its use of
measures of uncertainty (not probabilities) in its diagnoses and
its readiness to explain its reasoning to the physician, so that
he can decide whether to accept it.
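
Since the certainty factors are central to Mycin's diagnoses, it
is worth stating how they combine. Assuming the standard
published rule for two positive certainty factors $cf_1$ and
$cf_2$ bearing on the same conclusion, the combined factor is
%
$$cf = cf_1 + cf_2(1 - cf_1),$$
%
so that independent supporting evidence increases confidence but
never yields certainty. Such factors are not probabilities and
make no claim to obey the laws of probability.
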
Our discussion of Mycin begins with its {\it ontology}.
The ontology of a program is the set of entities that its
variables range over. Essentially this is what it can have
information about.
Mycin's ontology includes bacteria, symptoms, tests,
possible sites of infection, antibiotics and treatments.
Doctors, hospitals, illness and death are absent. Even patients
are not really part of the ontology, although Mycin asks for many
facts about the specific patient. This is because patients
aren't values of variables, and Mycin never compares the
infections of two different patients. It would therefore be
difficult to modify Mycin to learn from its experience.
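
The point about variables can be made concrete with a small
sketch (in Python, purely illustrative; Mycin itself was written
in Lisp). Organisms and sites are values of variables, so rules
can quantify over them, while the patient is only implicit global
context:

    # Illustrative sketch, not Mycin's code.
    # (organism, site, gramstain) facts about the current patient:
    cultures = [("organism-1", "blood", "negative"),
                ("organism-2", "csf", "positive")]

    # Expressible: a condition ranging over the organisms cultured
    # from blood, because organisms are values of the variable org.
    blood_organisms = [org for (org, site, gram) in cultures
                       if site == "blood"]
    print(blood_organisms)   # ['organism-1']

    # Not expressible: comparing this patient's infection with an
    # earlier patient's. There is no variable whose values are
    # patients, which is why learning from experience would
    # require a redesign of the ontology.
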
Mycin's program, written in a general scheme called Emycin,
is a so-called {\it production system}. A production system is a collection
of rules, each of which has two parts --- a pattern part and an
action part. When a rule is activated, Mycin tests whether the
pattern part matches the database. If it does, the variables in
the pattern are bound to whatever entities the match against the
database requires; if not, the pattern fails and Mycin tries
another rule. When a match succeeds, Mycin performs the action
part of the rule using the values of the variables determined by
the pattern part.
The whole process of questioning and recommending is built up
out of productions.
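
As an illustration, here is a minimal interpreter in Python for
the match-and-fire cycle just described. It is a sketch of the
general idea, not Emycin (which was written in Lisp and used a
different rule language), and the medical content of the facts
and the rule is invented:

    # Pattern elements beginning with "?" are variables; matching
    # binds them, and the action part is then instantiated with
    # the resulting bindings.
    DATABASE = {("infects", "organism-1", "blood"),
                ("gramstain", "organism-1", "negative")}

    RULES = [
        # pattern part                         action part
        ([("infects", "?org", "blood"),
          ("gramstain", "?org", "negative")],
         ("consider", "?org", "bacteremia")),
    ]

    def match(pattern, fact, env):
        """Match one pattern against one fact, extending env."""
        if len(pattern) != len(fact):
            return None
        env = dict(env)
        for p, f in zip(pattern, fact):
            if p.startswith("?"):
                if env.get(p, f) != f:
                    return None          # conflicting binding
                env[p] = f
            elif p != f:
                return None              # constant mismatch
        return env

    def match_all(patterns, db, env):
        """Yield every binding satisfying all the patterns."""
        if not patterns:
            yield env
            return
        for fact in db:
            e = match(patterns[0], fact, env)
            if e is not None:
                yield from match_all(patterns[1:], db, e)

    def run(rules, db):
        """Fire matching rules until the database stops growing."""
        changed = True
        while changed:
            changed = False
            for patterns, action in rules:
                new_facts = [tuple(env.get(t, t) for t in action)
                             for env in match_all(patterns, db, {})]
                for fact in new_facts:
                    if fact not in db:
                        db.add(fact)     # perform the action part
                        changed = True
        return db

    print(run(RULES, set(DATABASE)))

Run on the two facts above, the single rule fires once, adding
('consider', 'organism-1', 'bacteremia') to the database, after
which no rule produces anything new and the process stops.
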
The production formalism turned out to be suitable
for representing a large amount of information about the
diagnosis and treatment of bacterial infections. When Mycin
is used in its intended manner, it scores better than
medical students, interns or practicing physicians, and
on a par with experts in bacterial diseases when the latter
are asked to perform in the same way. However, Mycin
has not been put into production use, and the reasons given
by experts in the area varied when I asked whether it would
be appropriate to sell Mycin cassettes to doctors wanting
to put it on their micro-computers.

Some said it would be ok if there were a means of keeping
Mycin's database current with new discoveries in the field,
i.e. with new tests, new theories, new diagnoses and new
antibiotics. For example, Mycin would have to be told
about Legionnaires' disease and the associated {\it Legionella}
bacteria, which became understood only
after Mycin was finished. (Mycin is very stubborn about
new bacteria, and simply replies ``unrecognized response''.)

Others said that Mycin is not even close to usable except
experimentally, because it doesn't know its own limitations.
I suppose this is partly
a question of whether the doctor using Mycin is trusted
to understand the documentation about its limitations.
Programmers always develop the idea that the
users of their programs are idiots, so the opinion
that doctors aren't smart enough not to be misled by Mycin's limitations
may be at least partly a consequence of this ideology.

An example of Mycin not knowing its limitations can
be elicited by telling Mycin that the patient has {\it Vibrio
cholerae} in his intestines. Mycin will cheerfully recommend
two weeks of tetracycline and nothing else.
Presumably this would indeed kill the bacteria, but
most likely the patient would be dead of cholera long before that.
However, the physician will presumably know that the diarrhea has
to be treated and look elsewhere for how to do it.

On the other hand, it may really be true that some measure
of common sense is required for usefulness even in this narrow
domain. We'll list some areas of common sense knowledge
and reasoning ability and also apply the criteria to
Mycin and other hypothetical programs operating in Mycin's domain.
\section{What is Common Sense?}
%This page new.
Our discussion of common sense involves two aspects. First
we discuss what we call ``the common sense informatic situation'',
and second we discuss specific domains of common sense knowledge
and reasoning.

The common sense informatic situation differs from that
of all kinds of formal scientific theories in at least three ways.

1. The construction of a scientific theory involves a
decision as to what phenomena to take into account. After that,
rules for the interaction of these phenomena are
decided on. Often these rules take a mathematical form, but
this isn't the main distinction from common sense. If new
phenomena become apparent, the theory must be modified from
the outside. The theory can't accept new phenomena within
itself.

The common sense informatic situation is open to
new phenomena. Suppose on my way home on my bicycle I
unexpectedly encounter a herd of sheep on the road. This
has never happened to me on the Stanford University Campus
and probably never will. Maybe I will take another road,
maybe I will ride or walk my bicycle through the herd, and
maybe I will communicate with the sheep herder, policeman
or animal control officer about when the sheep will be out
of the way. A formal model of bicycling on the Stanford
Campus, e.g. for a robot bicyclist, would not take sheep
into account. Instead the robot would have to do as I would do:
appeal to general common sense knowledge of animal and
human behavior and the institutions of police.

The branch of science or engineering closest to AI is
operations research, because it undertakes to study problems from
any area of human endeavor and determine optimal behavior. Let's
compare it with common sense knowledge and ability. One of its
first applications, during World War II, was optimizing the
American and British airplane search for German submarines. The
analysis and the subsequent change in search strategy
substantially increased the number of submarines destroyed. The analysis took
into account when and where submarines were most likely to
surface and also facts about at what distances and from what
altitudes they could be detected under various weather
conditions. The methodology involved the operations researcher
deciding what facts to take into account and making a
mathematical model. Once this was decided, the strategy was
determined. The strategy itself could not take new phenomena
into account. If a new phenomenon was noticed by pilots, they
would have to go back to the researchers to construct a new
model.

Suppose, for example, the Germans had found some way to
use some kind of fake submarine as a decoy that would cause the
aircraft to reveal their presence by shooting at it. Once the
pilots noticed this, they would use their common sense to try
to minimize the effect of the decoys, and the operations researchers
would use their common sense to decide how to modify the mathematical
model to take the decoys into account.

A robot submarine chaser with common sense would have to
be able to take decoys and other new phenomena into account.
Some people have strong intuitions that this is impossible,
and this leads them to believe that AI is impossible.

The following small example shows what can be done. It
formalizes the effects of the actions involved in air travel,
and it has the following features.

1. A general formalism for describing the effects of actions
is discussed. It is a variant due to Vladimir Lifschitz (198xx) of the
situation calculus (McCarthy and Hayes 1969).

2. Specific facts concerning travel by airplane from one city
to another are given. The existence of a flight and the possession
of a ticket are made explicit preconditions of flying.

3. Facts relevant for flying from Glasgow to Moscow via London
are mentioned, i.e. the flights are mentioned.

4. The circumscription formalism of (McCarthy 1980) and
(McCarthy 1986) is used to minimize certain predicates, i.e.
$precond$, $noninertial$, $causes$ and $occurs$.

5. It can then be inferred (nonmonotonically) that flying
from Glasgow to London and then flying to Moscow results in being
in Moscow.

6. Facts giving the consequences of losing a ticket and of
buying a ticket are given. They don't change the previous inference.

7. An assertion that the ticket is lost in London is added
to the previous facts. Now it can no longer be inferred that the
previous plan succeeds. However, it can be inferred that the
plan of flying to London, then buying a ticket and then flying to
Moscow does succeed.

This example shows that it is possible to make a formalism
that (1) can be used to infer that a certain plan will succeed,
(2) no longer yields that inference when an obstacle is
mentioned, and (3) can be used to infer that a different plan,
one that overcomes the obstacle, will succeed.

Some domains in which it is hoped to use expert systems
require this capability.

Here are the formulas.
% from GLASGO.SLI[E89,JMC]/2P/59L
{% suppresses vertical bars
\overfullrule=0pt
%
$$succeeds(a,s) ≡ (∀p)(precond(p,a) ⊃ holds(p,s))$$
%
$$succeeds(a,s) ∧ causes(a,p) ⊃ holds(p,result(a,s))$$
%
$$¬noninertial(p,a) ∧ holds(p,s) ⊃ holds(p,result(a,s))$$
%
$$occurs(e,s) ⊃ outcome s = outcome result(e,s)$$
%
$$(∀e)¬occurs(e,s) ⊃ outcome s = s$$
%
$$rr(a,s) = outcome result(a,s)$$
%
$$causes(fly(x,y),at y)$$
%
$$precond(at x,fly(x,y))$$
%
$$precond(hasticket,fly(x,y))$$
%
$$precond(existsflight(x,y),fly(x,y))$$
$$causes(loseticket, not hasticket)$$
%
$$causes(buyticket,hasticket)$$
%
$$holds(not p,s) ≡ ¬holds(p,s)$$
$$holds(at Glasgow,S0)$$
%
$$holds(hasticket,S0)$$
%
$$holds(existsflight(Glasgow,London),S0)$$
%
$$holds(existsflight(London,Moscow),S0)$$
%
$$circum(Facts;causes,precond,noninertial,occurs;holds)$$
%
We can show
%
$$\eqalign{holds(at Moscow,rr&(fly(London,Moscow),\cr
&rr(fly(Glasgow,London),S0))),\cr}$$
%
but not if we add
%
$$occurs(loseticket,result(fly(Glasgow,London),S0)).$$
%
However, in this case we can show
%
$$\eqalign{holds(at Moscow,rr&(fly(London,Moscow),\cr
&rr(buyticket,\cr
&\ \ rr(fly(Glasgow,London),S0)))).\cr}$$}
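
Though the nonmonotonic step itself requires circumscription, the
intended conclusions can be checked procedurally. The following
Python sketch is my rendering of the axioms above, with the
occurrences given as an explicit table rather than minimized, and
with events assumed to succeed unconditionally:

    # A situation is a pair (term, fluents): the term names the
    # situation, so occurrences can be attached to a specific
    # situation, as in occurs(loseticket, result(fly(Glasgow,London),S0));
    # fluents is the set of propositions holding in the situation.

    def preconds(a):
        if isinstance(a, tuple):               # a = ("fly", x, y)
            _, x, y = a
            return {("at", x), "hasticket", ("existsflight", x, y)}
        return set()                           # buyticket, loseticket

    def effects(a, fluents):
        if isinstance(a, tuple):               # flying changes location;
            _, x, y = a                        # ("at", x) is noninertial
            return (fluents - {("at", x)}) | {("at", y)}
        if a == "buyticket":
            return fluents | {"hasticket"}
        if a == "loseticket":
            return fluents - {"hasticket"}

    def result(a, sit):
        term, fluents = sit
        if not preconds(a) <= fluents:
            return None                        # a does not succeed in sit
        return (("result", a, term), effects(a, fluents))

    def outcome(sit, occurs):
        term, fluents = sit
        events = occurs.get(term, ())
        if not events:
            return sit                         # no occurrences: outcome s = s
        for e in events:
            fluents = effects(e, fluents)
        return (("outcome", term), fluents)

    def rr(a, sit, occurs):                    # rr(a,s) = outcome result(a,s)
        r = result(a, sit)
        return r and outcome(r, occurs)

    def holds(p, sit):
        return sit is not None and p in sit[1]

    S0 = ("S0", frozenset({("at", "Glasgow"), "hasticket",
                           ("existsflight", "Glasgow", "London"),
                           ("existsflight", "London", "Moscow")}))

    # 1. With no occurrences, the two-flight plan reaches Moscow.
    s1 = rr(("fly", "Glasgow", "London"), S0, {})
    print(holds(("at", "Moscow"), rr(("fly", "London", "Moscow"), s1, {})))

    # 2. Assert occurs(loseticket, result(fly(Glasgow,London),S0)).
    lost = {("result", ("fly", "Glasgow", "London"), "S0"): ["loseticket"]}
    s1 = rr(("fly", "Glasgow", "London"), S0, lost)
    print(holds(("at", "Moscow"), rr(("fly", "London", "Moscow"), s1, lost)))

    # 3. The repaired plan buys a ticket in London first.
    s2 = rr("buyticket", s1, lost)
    print(holds(("at", "Moscow"), rr(("fly", "London", "Moscow"), s2, lost)))

The printed results are True, False and True, corresponding to
the three inferences listed earlier: the plan succeeds, it can no
longer be shown to succeed once the loss of the ticket is
asserted, and the repaired plan that buys a ticket in London
succeeds.
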
\smallskip\centerline{Copyright \copyright\ 1989\ by John McCarthy}
\smallskip\noindent{This draft of EXPERT[E89,JMC]\ TEXed on \jmcdate\ at \theTime}
%File originated on 19-Aug-89
\vfill\eject\end